{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Custom learners in Orange\n",
    "Orange contains many learners which can be used to fit models for classification, regression and other tasks. But it is also very simple to write your own learner. To start, define a subclass of the `Orange.classification.Learner` base class and implement either one or both of the fit methods: `fit` works on data matrices represented as numpy arrays, while the more general `fit_storage` uses the encapsulating `Orange.data.Storage` object (or a subclass such as `Orange.data.Table`).\n",
    "After the necessary computations, the learner should produce a fitted model object, derived from the `Orange.classification.Model` base class, which needs to implement `predict` or `predict_storage`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Linear regression\n",
    "Linear regression is of course already available through `Orange.regression.LinearRegressionLearner`, which uses the implementation from scikit-learn. Here, we show a simpler version using normal equations to demonstrate how to write your own numpy-based learner from scratch."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We choose to implement the `fit` method in this example, since the least squares coefficients are easily computed with standard numpy operations on data matrices. Similarly, the model class implements the `predict` method, to make predictions of target values for new data instances."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from numpy.linalg import pinv\n",
    "from Orange.classification import Learner, Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class LinearRegression(Learner):\n",
    "    def fit(self, X, Y, W=None):\n",
    "        coef = pinv(X.T.dot(X)).dot(X.T).dot(Y)\n",
    "        return LinearRegressionModel(coef)\n",
    "\n",
    "class LinearRegressionModel(Model):\n",
    "    def __init__(self, coef):\n",
    "        self.coef = coef\n",
    "\n",
    "    def predict(self, X):\n",
    "        return X.dot(self.coef)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the above simplified version of linear regression does not fit the intercept and ignores instance weights.\n",
    "\n",
    "We can evaluate its performance with cross-validation on one of the data sets provided in Orange."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 9.20124355,  4.87903715,  5.09814721])"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import Orange\n",
    "housing = Orange.data.Table('housing')\n",
    "learners = [Orange.regression.MeanLearner(), Orange.regression.LinearRegressionLearner(), LinearRegression()]\n",
    "res = Orange.evaluation.CrossValidation(housing, learners)\n",
    "Orange.evaluation.RMSE(res)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that the error is much lower than predicting the mean value, but slightly higher than from the included `LinearRegressionLearner` from Orange using scikit-learn. Try adding a column of ones to the existing input features to allow model bias and check the improvement."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Wrapper\n",
    "Sometimes we want to add some additional functionality to an existing learner (or learners), which can be easily done with a wrapper class."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose we wish to know how much time was spent to fit a model. The following wrapper uses an existing learner to fit the model, but measures the time spent and stores it in a special attribute.\n",
    "Because we do not need to manipulate the data matrices (`X`, `Y`, `W`) it is easier to implement the `fit_storage` method instead of `fit`, which differs only in its arguments and receives the data packed in a single `Orange.data.Storage` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from time import time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class TimedLearner(Learner):\n",
    "    def __init__(self, learner):\n",
    "        self.learner = learner\n",
    "        self.time = 0\n",
    "\n",
    "    def fit_storage(self, data):\n",
    "        t = time()\n",
    "        model = self.learner(data)\n",
    "        model.time = time() - t\n",
    "        self.time += model.time\n",
    "        return model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This time we did not need to write a Model class since we return the same model instance of the model as the base learner. An additional attribute `time` is added to the model containing the time in seconds used to fit it. This time is also added to the cumulative time used to fit all models and stored as an attribute of the learner."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.0021028518676757812\n",
      "0.0014808177947998047\n",
      "0.003583669662475586\n"
     ]
    }
   ],
   "source": [
    "tl = TimedLearner(Orange.regression.LinearRegressionLearner())\n",
    "m1 = tl(housing)\n",
    "print(m1.time)\n",
    "m2 = tl(housing)\n",
    "print(m2.time)\n",
    "print(tl.time)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Wrappers for scikit-learn methods\n",
    "Coming soon..."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
